---
displayed_sidebar: "English"
---

# Broker Load

## 1. Does Broker Load support re-running load jobs that have been run successfully and are in the FINISHED state?

Broker Load does not support re-running load jobs that have been run successfully and are in the FINISHED state. Also, to prevent data loss and duplication, Broker Load does not allow reusing the labels of successfully run load jobs. You can use [SHOW LOAD](../../sql-reference/sql-statements/data-manipulation/SHOW_LOAD.md) to view the history of load jobs and find the load job that you want to re-run. Then, you can copy the information of that load job and use the job information, except the label, to create another load job.

## 2. When I load data from HDFS by using Broker Load, what do I do if the date and time values loaded into the destination StarRocks table are 8 hours later than the date and time values from the source data file?

Both the destination StarRocks table and the Broker Load job are compiled at creation to use a China Standard Time (CST) time zone (specified by using the `timezone` parameter). However, the server is set to run based on a Coordinated Universal Time (UTC) time zone. As a result, 8 extra hours are added to the date and time values from the source data file during data loading. To prevent this issue, do not specify the `timezone` parameter when you create the destination StarRocks table.

## 3. When I load ORC-formatted data by using Broker Load, what do I do if the `ErrorMsg: type:ETL_RUN_FAIL; msg:Cannot cast '<slot 6>' from VARCHAR to ARRAY<VARCHAR(30)>` error occurs?

The source data file has different column names than the destination StarRocks table. In this situation, you must use the `SET` clause in the load statement to specify the column mapping between the file and the table. When executing the `SET` clause, StarRocks needs to perform a type inference, but it fails in invoking the [cast](../../sql-reference/sql-functions/cast.md) function to transform the source data to the destination data types. To resolve this issue, make sure that the source data file has the same column names as the destination StarRocks table. As such, the `SET` clause is not needed and therefore StarRocks does not need to invoke the cast function to perform data type conversions. Then the Broker Load job can be run successfully.

## 4. The Broker Load job does not report errors, but why am I unable to query the loaded data?

Broker Load is an asynchronous loading method. The load job may still fail even if the load statement does not return errors. After you run a Broker Load job, you can use [SHOW LOAD](../../sql-reference/sql-statements/data-manipulation/SHOW_LOAD.md) to view the result and `errmsg` of the load job. Then, you can modify the job configuration and retry.

## 5. What do I do if the "failed to send batch" or "TabletWriter add batch with unknown id" error occurs?

The amount of time taken to write the data exceeds the upper limit, causing a timeout error. To resolve this issue, modify the settings of the [session variable](../../reference/System_variable.md) `query_timeout` and the [BE configuration item](../../administration/BE_configuration.md#configure-be-static-parameters) `streaming_load_rpc_max_alive_time_sec` based on your business requirements.

## 6. What do I do if the "LOAD-RUN-FAIL; msg:OrcScannerAdapter::init_include_columns. col name = xxx not found" error occurs?

If you are loading Parquet- or ORC-formatted data, check whether the column names held in the first row of the source data file are the same as the column names of the destination StarRocks table.

```SQL
(tmp_c1,tmp_c2)
SET
(
   id=tmp_c2,
   name=tmp_c1
)
```

The preceding example maps the `tmp_c1` and `tmp_c2` columns of the source data file onto the `name` and `id`  columns of the destination StarRocks table, respectively. If you do not specify the `SET` clause, the column names specified in the `column_list` parameter are used to declare the column mapping. For more information, see [BROKER LOAD](../../sql-reference/sql-statements/data-manipulation/BROKER_LOAD.md).

> **NOTICE**
>
> If the source data file is an ORC-formatted file generated by Apache Hive™ and the first row of the file holds `(_col0, _col1, _col2, ...)`, the "Invalid Column Name" error may occur. If this error occurs, you need to use the `SET` clause to specify the column mapping.

## 7. How do I handle errors such as the error that causes the Broker Load job to run for an excessively long period of time?

View the FE log file **fe.log** and search for the ID of the load job based on the job label. Then, view the BE log file **be.INFO** and retrieve the log records of the load job based on the job ID to locate the root cause of the error.

## 8. How do I configure an Apache HDFS cluster that runs in HA mode?

If an HDFS cluster runs in high availability (HA) mode, configure it as follows:

- `dfs.nameservices`: the name of the HDFS cluster, for example, `"dfs.nameservices" = "my_ha"`.

- `dfs.ha.namenodes.xxx`: the name of the NameNode in the HDFS cluster. If you specify multiple NameNode names, separate them with commas (`,`). `xxx` is the HDFS cluster name that you have specified in `dfs.nameservices`, for example, `"dfs.ha.namenodes.my_ha" = "my_nn"`.

- `dfs.namenode.rpc-address.xxx.nn`: the RPC address of the NameNode in the HDFS cluster. `nn` is the NameNode name that you have specified in `dfs.ha.namenodes.xxx`, for example, `"dfs.namenode.rpc-address.my_ha.my_nn" = "host:port"`.

- `dfs.client.failover.proxy.provider`: the provider of the NameNode to which the client will connect. Default value: `org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider`.

Example:

```SQL
(
    "dfs.nameservices" = "my-ha",
    "dfs.ha.namenodes.my-ha" = "my-namenode1, my-namenode2",
    "dfs.namenode.rpc-address.my-ha.my-namenode1" = "nn1-host:rpc_port",
    "dfs.namenode.rpc-address.my-ha.my-namenode2" = "nn2-host:rpc_port",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
```

The HA mode can be used with simple authentication or Kerberos authentication. For example, to use simple authentication to access an HDFS cluster that runs in HA mode, you need to specify the following configurations:

```SQL
(
    "username"="user",
    "password"="passwd",
    "dfs.nameservices" = "my-ha",
    "dfs.ha.namenodes.my-ha" = "my_namenode1, my_namenode2",
    "dfs.namenode.rpc-address.my-ha.my-namenode1" = "nn1-host:rpc_port",
    "dfs.namenode.rpc-address.my-ha.my-namenode2" = "nn2-host:rpc_port",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)
```

You can add the configurations of the HDFS cluster to the **hdfs-site.xml** file. This way, you only need to specify the file path and authentication information when you use brokers to load data from the HDFS cluster.

## 9. How do I configure Hadoop ViewFS Federation?

Copy the ViewFs-related configuration files `core-site.xml` and `hdfs-site.xml` to the **broker/conf** directory.

If you have a custom file system, you also need to copy the file system-related **.jar** files to the **broker/lib** directory.

## 10. When I access an HDFS cluster that requires Kerberos authentication, what do I do if the "Can't get Kerberos realm" error occurs?

Check that the **/etc/krb5.conf** file is configured on all hosts on which brokers are deployed.

If the error persists, add `-Djava.security.krb5.conf:/etc/krb5.conf` to the end of the `JAVA_OPTS` variable in the broker startup script.
